Fast Identification of Stop Words for Font Learning and Keyword Spotting

Author

  • Tin Kam Ho
Abstract

A recently proposed adaptive strategy for text recognition exploits the linguistic fact that over half of the words on a typical English page are among 150 common stop words. The small lexicon permits word-shape-based recognition that yields word identities from which character prototypes can be extracted. This paper describes a fast procedure for locating the best candidates for those stop words. The procedure uses width statistics of individual words and their immediate neighbors. In an experiment using 400 page images, the method removed 63% of the words from consideration. The stop/non-stop word discrimination also assists keyword spotting for information retrieval.

Similar resources

Lexical Access-based Confidence Measure for a Spanish Keyword Spotting System

Keyword spotting deals with the search for a reduced set of keywords in audio content. Phone lattice-based approaches are very fast but achieve poor results. HMM-based keyword spotting systems use filler models to absorb out-of-vocabulary (OOV) words and achieve the best results, although they are slower. We propose a technique which combines them in order to perform a confidence measure to...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. This paper proposes an approach to document image retrieval based on keyword spotting, within a framework that uses relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Keyword spotting in multi-player voice driven games for children

Word spotting, or keyword identification, is a highly challenging task when there are multiple speakers speaking simultaneously. In the case of a game being controlled by children solely through voice, the task becomes extremely difficult. Children, unlike adults, typically do not await their turn to speak in an orderly fashion. They interrupt and shout at arbitrary times, speak or say things t...

Session 3: Continuous Speech Recognition

The papers in this session focus on techniques for and applications of large-vocabulary continuous speech recognition. The technique-oriented papers discuss channel compensation, fast search, acoustic modeling, and adaptive language modeling. The application-oriented papers discuss methods for using recognizers for language identification, speaker identification, speaker sex iden...

Recognition and Rejection Performance in Wordspotting Systems Using Support Vector Machines

Support Vector Machines (SVM) is one such machine learning technique that learns the decision surface through a process of discrimination and has a good generalization capacity [6]. SVMs have proven to be successful classifiers on several classical pattern recognition problems [9, 11]. In this paper, one of the first applications of the Support Vector Machines (SVM) technique for the problem of...

Publication date: 1999